Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples

Authors

  • Jirí Hroza
  • Jan Zizka
Abstract

Automated searching for interesting text documents frequently suffers from a severe imbalance between the documents representing positive and negative examples, or from one class missing entirely. This paper proposes a ranking approach based on the k-NN algorithm, adapted to determine the degree of similarity of new documents to a representative collection of positive examples only. Guided by the precision-recall trade-off, a user can decide in advance how many articles, and how similar, should be released through the filter.
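The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words representation, the cosine similarity measure, the averaging over the k nearest positive examples, and all function names (`tokenize`, `cosine`, `knn_positive_score`, `rank_candidates`, `release`) are assumptions made here for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_positive_score(doc, positives, k=3):
    """Mean similarity of `doc` to its k most similar positive examples.

    Assumption: averaging the top-k similarities is one plausible way to
    turn k-NN into a score when no negative examples are available.
    """
    vec = Counter(tokenize(doc))
    sims = sorted(
        (cosine(vec, Counter(tokenize(p))) for p in positives),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank_candidates(candidates, positives, k=3):
    """Return (score, document) pairs, most similar first."""
    scored = [(knn_positive_score(c, positives, k), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def release(ranked, min_score=0.0, max_items=None):
    """Filter a ranked list by how similar and how many items to let through."""
    kept = [(s, c) for s, c in ranked if s >= min_score]
    return kept if max_items is None else kept[:max_items]


if __name__ == "__main__":
    positives = [
        "k-NN text classification of documents",
        "ranking documents by similarity",
        "filtering interesting text articles",
    ]
    candidates = [
        "recipe for chocolate cake",
        "similarity ranking of text documents",
    ]
    for score, doc in release(rank_candidates(candidates, positives), min_score=0.1):
        print(f"{score:.3f}  {doc}")
```

In this sketch, `min_score` and `max_items` stand in for the two choices the abstract mentions: how similar released articles must be and how many are released. Raising `min_score` would be expected to favour precision, while lowering it would favour recall.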

Similar resources

A note on "An interval type-2 fuzzy extension of the TOPSIS method using alpha cuts"

The technique for order of preference by similarity to ideal solution (TOPSIS) is a method based on ideal solutions, in which the most desirable alternative should have the shortest distance from the positive ideal solution and the longest distance from the negative ideal solution. Depending on the type of evaluations or the method of ranking, different approaches have been proposed to calculate distances ...

Similarity-Based Approach for Positive and Unlabeled Learning

Positive and unlabelled learning (PU learning) has been investigated to deal with the situation where only the positive examples and the unlabelled examples are available. Most of the previous works focus on identifying some negative examples from the unlabelled data, so that the supervised learning methods can be applied to build a classifier. However, for the remaining unlabelled data, which ...

Kobe University at TRECVID 2009 Search Task

In TRECVID 2009 search task, we have developed a method which defines any interesting topic from examples provided by a user, especially, positive and negative examples. Specifically, considering a large variation of features in a topic, we use “rough set theory” which defines the topic as a union of subsets. In each subset, some positive examples can be correctly distinguished from all negativ...

A revised Fuzzy-PROMETHEE method, using Fuzzy Distance and Similarity Measures

PROMETHEE refers to a collection of ranking methods in the field of multi-criteria decision making. These methods are characterized by conceptual simplicity and practical applicability. However, the nature of phenomena involving decision-making in the real world leads us to use a fuzzy method of preference ranking. The most common criticism of mathematical ranking procedures is that they tend to d...

Strategy for research of new pharmacologically active molecules from plants for the treatment of pathologies

Herbal medicine, botanical medicine, phytotherapy, alternative medicine, or complementary medicine are terms used to describe the science of using plant-based materials to treat specific symptoms or diseases. People strongly believe that natural remedies are perfectly safe. Because we have strong ties to traditional culture, we use herbs and spices on a daily basis. Plants are an abundant natura...

Publication date: 2005